We propose a general framework to describe the impact of different events in the order book,
that generalizes previous work on the impact of market orders. Two different modeling routes can
be considered, which are equivalent when only market orders are taken into account. One model
posits that each event type has a temporary impact (TIM). The "history dependent impact" model
(HDIM), on the other hand, assumes that only price-changing events have a direct impact, itself
modulated by the past history of all events through an "influence matrix" that measures how much,
on average, an event of a given type affects the immediate impact of a price-changing event of the
same sign in the future. We find in particular that aggressive market orders tend to reduce the
impact of further aggressive market orders of the same sign (and increase the impact of aggressive
market orders of opposite sign). We discuss the relative merits of TIM and HDIM, in particular
concerning their ability to reproduce accurately the price diffusion pattern. We find that in spite
of theoretical inconsistencies, TIM appears to fare better than HDIM when compared to empirical
data. We ascribe this paradox to an uncontrolled approximation used to calibrate HDIMs, calling
for further work on this issue.
1. INTRODUCTION

The relation between order flow and price changes has attracted considerable attention in the recent years [iHal. 
Most empirical studies to date have focused on the impact of (buy/sell) market orders. Many interesting results have 
been obtained, such as the very weak dependence of impact on the volume of the market order, the long-range nature 
of the sign of the trades, and the resulting non-permanent, power-law decay of market order impact with time (see 
Ref. [6]). However, this representation of impact is incomplete in at least two ways. First, the impact of all market
orders is usually treated on the same footing, or with a weak dependence on volume, whereas some market orders are
"aggressive" and immediately change the price, while others are more passive and only have a delayed impact on the
price. Second, other types of order book events (limit orders, cancellations) must also directly impact prices: adding 
a buy limit order induces extra upwards pressure, and cancelling a buy limit order decreases this pressure. Within a 
description based on market orders only, the impact of limit orders and cancellations is included in an indirect way, in 
fact as an effectively decaying impact of market orders. This decay reflects the "liquidity refill" mechanism explained 
in detail in Refs. [J, l6|-|9|, whereby market orders trigger a counterbalancing flow of limit orders.

A framework allowing one to analyze the impact of all order book events, and to understand in detail the statistical 
properties of price time series, is clearly desirable. Surprisingly, however, there are only very few quantitative studies 
of the impact of limit orders and cancellations [l3, [ill - partly due to the fact that more detailed data, beyond 
trades and quotes, is often needed to conduct such studies. The aim of the present paper is to provide a theoretical 
framework for the impact of all order book events, which allows one to build statistical models with intuitive and
transparent interpretations [11].



2. A SHORT SUMMARY OF MARKET ORDER IMPACT MODELS 

A simple model relating prices to trades posits that the mid-point price $p_t$ just before trade $t$ can be written as a
linear superposition of the impact of all past trades:

$$p_t = \sum_{t'<t} \left[ G(t-t')\,\xi_{t'} + \eta_{t'} \right] + p_{-\infty}, \qquad \xi_t = \epsilon_t v_t^{\theta}, \tag{1}$$



where $v_{t'}$ is the volume of the trade at time $t'$, $\epsilon_{t'}$ the sign of that trade ($+$ for a buy, $-$ for a sell), and $\eta_t$ is an
independent noise term that models any price change not induced by trades (e.g. jumps due to news). The exponent
$\theta$ is found to be small. The most important object in the above equation is the function $G(\ell)$ describing the temporal
evolution of the impact of a single trade, which can be called a 'propagator': how does the impact of the trade at time
$t' = t - \ell$ propagate, on average, up to time $t$? Because the signs of trades are strongly auto-correlated, $G(\ell)$ must
decay with time in a very specific way, in order to maintain the (statistical) efficiency of prices. Clearly, if $G(\ell)$ did not
decay at all, the returns^1 $r_t = p_{t+1} - p_t$ would simply be proportional to the sign of the trades, and therefore would
themselves be strongly autocorrelated in time. The resulting price dynamics would then be highly predictable, which
is not realistic. The result of Ref. Q is that if the correlation of signs $C(\ell) = \langle \epsilon_t \epsilon_{t+\ell} \rangle$ decays at large $\ell$ as $\ell^{-\gamma}$ with
$\gamma < 1$ (as found empirically), then $G(\ell)$ must decay as $\ell^{-\beta}$ with $\beta = (1-\gamma)/2$ for the price to be exactly diffusive at
long times. The impact of single trades is therefore predicted to decay as a power-law (at least up to a certain time
scale).

The above model can be rewritten in a completely equivalent way in terms of returns, with a slightly different 
interpretation [3, Q: 

$$r_t = G(1)\,\xi_t + \sum_{t'<t} \kappa(t-t')\,\xi_{t'} + \eta_t, \qquad \kappa(\ell) = G(\ell+1) - G(\ell). \tag{2}$$

This can be read as saying that the $t$-th trade has a permanent impact on the price, but this impact is history dependent
and depends on the sequence of past trades. The fact that $G$ decays with $\ell$ implies that the kernel $\kappa$ is negative,
and therefore that a past sequence of buy trades ($\xi_{t'<t} > 0$) tends to reduce the impact of a further buy trade, but
increase the impact of a sell trade. This is again a consequence of the dynamical nature of liquidity: when trades
persist in a given direction, opposing limit orders tend to pile up and reduce the average impact of the next trade in
the same direction. For the price to be an exact martingale, the quantity $\hat{\xi}_t = -\sum_{t'<t} \kappa(t-t')\,\xi_{t'}$ (in units where
$G(1) = 1$) must be equal to the conditional expectation of $\xi_t$ at time $t^-$, such that $\xi_t - \hat{\xi}_t$ is the surprise part of $\xi_t$.
This condition allows one to recover the above mentioned decay of $G(\ell)$ at large $\ell$.
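The equivalence between the price form, Eq. (1), and the return form, Eq. (2), can be checked numerically. The following is a minimal sketch, assuming a hypothetical power-law propagator $G(\ell) = \ell^{-\beta}$ with $\beta = 0.25$, unit volumes, uncorrelated random signs and no noise ($\eta_t = 0$). The function names and parameter values are illustrative and are not taken from the paper.

```python
import random

def propagator(ell, beta=0.25):
    """Hypothetical power-law propagator G(ell) = ell**(-beta), for ell >= 1."""
    return ell ** (-beta)

def prices_from_eq1(xi, G):
    """Eq. (1) with eta = 0 and p_{-inf} = 0: p_t = sum_{t'<t} G(t - t') xi_{t'}.

    Returns len(xi) + 1 prices; p[t] is the price just before trade t.
    """
    T = len(xi)
    return [sum(G(t - tp) * xi[tp] for tp in range(t)) for t in range(T + 1)]

def returns_from_eq2(xi, G):
    """Eq. (2) with eta = 0: r_t = G(1) xi_t + sum_{t'<t} kappa(t - t') xi_{t'}."""
    def kappa(ell):
        return G(ell + 1) - G(ell)
    return [G(1) * xi[t] + sum(kappa(t - tp) * xi[tp] for tp in range(t))
            for t in range(len(xi))]

random.seed(0)
xi = [random.choice((-1.0, 1.0)) for _ in range(200)]  # signed unit volumes

p = prices_from_eq1(xi, propagator)
r_from_prices = [p[t + 1] - p[t] for t in range(len(xi))]
r_from_kernel = returns_from_eq2(xi, propagator)

# Largest discrepancy between the two formulations (rounding error only).
max_diff = max(abs(a - b) for a, b in zip(r_from_prices, r_from_kernel))
```

Since this $G$ is decreasing, the kernel $\kappa(\ell) = G(\ell+1) - G(\ell)$ is negative at every lag, which is the property used in the argument above.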

In order to calibrate the model, one can use the empirically observable impact function $\mathcal{R}(\ell)$, defined as:

$$\mathcal{R}(\ell) = \langle (p_{t+\ell} - p_t) \cdot \xi_t \rangle, \tag{3}$$

and the time correlation function $C(\ell)$ of the variable $\xi_t = \epsilon_t v_t^{\theta}$ to map out, numerically, the complete shape of $G(\ell)$.
This was done in Ref. [J|, using the exact relation:

$$\mathcal{R}(\ell) = \sum_{0<n\le\ell} G(n)\,C(\ell-n) + \sum_{n>0} \left[ G(n+\ell) - G(n) \right] C(n). \tag{4}$$
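Both observables entering Eq. (4) are plain time averages. The sketch below shows one possible way to estimate them, assuming the prices are stored as a list `p` with one more entry than the list `xi` of signed volumes (so that `p[t]` is the price just before trade `t`). These names and the toy data are assumptions made for illustration only.

```python
def response(p, xi, ell):
    """Empirical R(ell) = <(p[t+ell] - p[t]) * xi[t]> of Eq. (3), for ell >= 1."""
    n = len(xi) - ell + 1  # number of t for which p[t + ell] exists
    return sum((p[t + ell] - p[t]) * xi[t] for t in range(n)) / n

def correlation(xi, ell):
    """Empirical C(ell) = <xi[t] * xi[t+ell]>, for ell >= 0."""
    n = len(xi) - ell
    return sum(xi[t] * xi[t + ell] for t in range(n)) / n

# Toy data: strictly alternating signs and a permanent unit impact,
# i.e. a constant propagator G = 1, so that p[t+1] - p[t] = xi[t].
xi = [1.0 if t % 2 == 0 else -1.0 for t in range(100)]
p = [0.0]
for x in xi:
    p.append(p[-1] + x)

R1, R2 = response(p, xi, 1), response(p, xi, 2)
C0, C1 = correlation(xi, 0), correlation(xi, 1)
```

For a constant propagator the second sum of Eq. (4) vanishes and the relation reduces to $\mathcal{R}(\ell) = \sum_{0<n\le\ell} C(\ell-n)$. The toy data satisfy this: $\mathcal{R}(1) = C(0)$ and $\mathcal{R}(2) = C(1) + C(0)$.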



^1 In the following, we only focus on price changes over small periods of time, so that an additive model is adequate.



Alternatively, one can use the 'return' version of the model, Eq. (2), which gives:

$$S(\ell) \equiv \langle r_{t+\ell} \cdot \xi_t \rangle = \mathcal{R}(\ell+1) - \mathcal{R}(\ell), \tag{5}$$

that in turn leads to

$$S(\ell) = G(1)\,C(\ell) + \sum_{n>-\ell} \kappa(\ell+n)\,C(n). \tag{6}$$

As noted in Ref. [12], this second implementation is in fact much less sensitive to finite size effects and therefore more
adapted to data analysis.^2

The above model, regardless of the type of fitting, is approximate and incomplete in two interrelated ways. First,
Eqs. (1) and (2) neglect the fluctuations of the impact: one expects in general that $G$ and $\kappa$ should depend both on $t$
and $t'$ and not only on $\ell = t - t'$. Impact can indeed be quite different depending on the state of the order book and
the market conditions at $t'$. As a consequence, if one blindly uses Eqs. (1) and (2) to compute the second moment of
the price difference, $D(\ell) = \langle (p_{t+\ell} - p_t)^2 \rangle$, with a non-fluctuating $G(\ell)$ calibrated to reproduce the impact function
$\mathcal{R}(\ell)$, the result clearly underestimates the empirical price variance: see Fig. 1.
Figure 1: $D(\ell)/\ell$ and its approximation with the transient impact model (TIM) with only trades as events, with $\eta_t = 0$ and
for small tick stocks. Results are shown when assuming that all trades have the same, non fluctuating impact $G(\ell)$, calibrated
to reproduce $\mathcal{R}(\ell)$. This simple model accounts for about 2/3 of the long term volatility. Other events and/or the fluctuations of
$G(\ell)$ must therefore contribute to the market volatility as well.

Adding a diffusive noise $\eta_t \neq 0$ would only shift $D(\ell)/\ell$ upwards, but this is insufficient to reproduce the empirical
data. Second, as noted in the introduction, other events of the order book can also change the mid-price, such as
limit orders placed inside the bid-ask spread, or cancellations of all the volume at the bid or the ask. These events do
indeed contribute to the price volatility and should be explicitly included in the description. A simplified description
of price changes in terms of market orders only attempts to describe other events of the order book in an effective
way, through the non-trivial time dependence of $G(\ell)$.

In the following, we will generalize the above model to account for the impact of other types of events, beyond 
market orders. In this case, however, it will become apparent that the two versions of the above model are no longer 
equivalent, and lead to different quantitative results. Our main objective will be to come up with a simple, intuitive 
model that (i) can be easily calibrated on data, (ii) reproduces as closely as possible the second moment of the price 
difference $D(\ell)$, (iii) can be generalized to richer and richer data sets, where more and more events are observable and
(iv) can in principle be systematically improved. 



^2 The key difference is that a numerical solution necessarily truncates $G(\ell)$ in Eq. (4) and $\kappa(\ell)$ in Eq. (6) at some arbitrary $\ell_{\max}$. This
truncation in the former case corresponds to the boundary condition $G(\ell > \ell_{\max}) = 0$, hence a fully temporary impact at long
times, while in the latter case it corresponds to $G(\ell > \ell_{\max}) = G(\ell_{\max})$, hence a partially permanent impact. The latter solution
is smoother and consequently better behaved numerically.
| event type | event definition | event sign definition | gap definition ($\Delta_{\pi,\epsilon}$) |
|---|---|---|---|
| $\pi = \mathrm{MO}^0$ | market order, volume < outstanding volume at the best | $\epsilon = \pm 1$ for buy/sell market orders | 0 |
| $\pi = \mathrm{MO}'$ | market order, volume ≥ outstanding volume at the best | $\epsilon = \pm 1$ for buy/sell market orders | half of first gap behind the ask ($\epsilon = 1$) or bid ($\epsilon = -1$) |
| $\pi = \mathrm{CA}^0$ | partial cancellation of the bid/ask queue | $\epsilon = \mp 1$ for buy/sell side cancellation | 0 |
| $\pi = \mathrm{LO}^0$ | limit order at the current best bid/ask | $\epsilon = \pm 1$ for buy/sell limit orders | 0 |
| $\pi = \mathrm{CA}'$ | complete cancellation of the best bid/ask | $\epsilon = \mp 1$ for buy/sell side cancellation | half of first gap behind the ask ($\epsilon = 1$) or bid ($\epsilon = -1$) |
| $\pi = \mathrm{LO}'$ | limit order inside the spread | $\epsilon = \pm 1$ for buy/sell limit order | half distance of limit order from the earlier best quote on the same side |

Table I: Summary of the 6 possible event types, the corresponding definitions of the event signs and gaps.

3. MANY-EVENT IMPACT MODELS 
3.1. Notations and definitions 



The dynamics of the order book is rich and complex, and involves the intertwined arrival of many types of events.
These events can be categorized in different, more or less natural types. In the empirical analysis presented below,
we have considered the following six types of events:

• market orders that do not change the best price (noted $\mathrm{MO}^0$) or that do change the price (noted $\mathrm{MO}'$),

• limit orders at the current bid or ask ($\mathrm{LO}^0$) or inside the bid-ask spread so that they change the price ($\mathrm{LO}'$),

• and cancellations at the bid or ask that do not remove all the volume quoted there ($\mathrm{CA}^0$) or that do ($\mathrm{CA}'$).

Of course, events deeper in the order book could also be added, as well as any extra division of the above types into
subtypes, if more information is available. For example, if the identity code of each order is available, one can classify
each event according to its origin, as was done in Ref. [13]. The generic notation for an event type occurring at event
time $t$ will be $\pi_t$. The upper index $'$ ("prime") will denote that the event changed any of the best prices, and the upper
index $0$ that it did not. Abbreviations without the upper index (MO, CA, LO) refer to both the price changing and
the non-price changing event type. Every event is given a sign $\epsilon_t$ according to its expected long-term effect on the
price - the precise definitions are summarized in Table I. Note that the table also defines the gaps $\Delta_{\pi,\epsilon}$, which will
be used later. We will rely on indicator variables denoted as $I(\pi_t = \pi)$. This expression is equal to 1 if the event at $t$
is of type $\pi$ and zero otherwise. We also use the notation $\langle \cdot \rangle$ to denote the time average of the quantity between the
brackets.

Let us now define the response of the price to different types of orders. The average behavior of price after events
of a particular type $\pi$ defines the corresponding response function (or average impact function):

$$\mathcal{R}_\pi(\ell) = \langle (p_{t+\ell} - p_t) \cdot \epsilon_t \,|\, \pi_t = \pi \rangle. \tag{7}$$

This is a correlation function between $\epsilon_t I(\pi_t = \pi)$ at time $t$ and the price change from $t$ to $t + \ell$, normalized by the
stationary probability of the event $\pi$, denoted as $P(\pi) = \langle I(\pi_t = \pi) \rangle$. This normalized response function gives the
expected directional price change after an event $\pi$. Its behavior for all $\pi$'s is shown in Fig. 2(left). Tautologically,
$\mathcal{R}_\pi(\ell = 1) > 0$ for price changing events and $\mathcal{R}_\pi(\ell = 1) = 0$ for the others. Empirically, all types of events lead, on
average, to a price change in the expected direction, i.e. $\mathcal{R}_\pi(\ell) > 0$.
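The conditional average in Eq. (7) can be estimated by keeping only the times at which the indicator $I(\pi_t = \pi)$ equals 1. The following sketch assumes that the event stream is stored as parallel lists `types` and `signs`, and that `p` holds the mid-price just before each event plus one final entry. The data layout, the type labels and the toy numbers are hypothetical.

```python
def event_probability(types, pi):
    """P(pi) = <I(pi_t = pi)>: fraction of events that are of type pi."""
    return sum(1 for x in types if x == pi) / len(types)

def response_by_type(p, types, signs, pi, ell):
    """R_pi(ell) = <(p[t+ell] - p[t]) * eps_t | pi_t = pi>, Eq. (7), ell >= 1.

    Averaging over the selected times is the same as dividing the
    unconditional average of the indicator-weighted product by P(pi),
    up to edge effects. Raises ZeroDivisionError if pi never occurs.
    """
    terms = [(p[t + ell] - p[t]) * signs[t]
             for t in range(len(types) - ell + 1) if types[t] == pi]
    return sum(terms) / len(terms)

# Toy event stream: four events and five mid-prices (in ticks).
types = ["MO'", "LO0", "MO'", "CA0"]
signs = [1, 1, -1, -1]
p = [100.0, 100.5, 100.5, 100.0, 100.0]

R_mo_1 = response_by_type(p, types, signs, "MO'", 1)
R_lo_1 = response_by_type(p, types, signs, "LO0", 1)
R_mo_2 = response_by_type(p, types, signs, "MO'", 2)
P_mo = event_probability(types, "MO'")
```

In this toy stream the price changing type has a strictly positive one-event response while the queue event has none, mirroring the tautological statement about $\mathcal{R}_\pi(\ell = 1)$ above.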

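We will also need the "return" response function, upgrading the quantity $S(\ell)$ defined above to a matrix:

$$S_{\pi_1,\pi_2}(\ell) = \langle I(\pi_{t+\ell} = \pi_2) \cdot r_{t+\ell} \cdot \epsilon_t \,|\, \pi_t = \pi_1 \rangle. \tag{8}$$

Clearly, as an exact identity:

$$\mathcal{R}_{\pi_1}(\ell) = \sum_{\ell'=0}^{\ell-1} \sum_{\pi_2} S_{\pi_1,\pi_2}(\ell'). \tag{9}$$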
Figure 2: (left): The response function $\mathcal{R}_\pi(\ell)$ and (right): the bare impact function $G_\pi(\ell)$ for the TIM, for the data described
in Sec. 4.1. The curves are labeled according to $\pi$ in the legend. Note that $G_\pi(\ell)$ is to a first approximation independent of $\ell$
for all $\pi$'s. However, the small variations are real and important to reproduce the correct price diffusion curve $D(\ell)$.



Similarly, the signed-event correlation function is defined as:

$$C_{\pi_1,\pi_2}(\ell) = \frac{\langle I(\pi_t = \pi_1)\,\epsilon_t\, I(\pi_{t+\ell} = \pi_2)\,\epsilon_{t+\ell} \rangle}{P(\pi_1)\,P(\pi_2)}. \tag{10}$$

Our convention is that the first index corresponds to the first event in chronological order. Note that in general, there
are no reasons to expect time reversal symmetry, which would impose $C_{\pi_1,\pi_2}(\ell) = C_{\pi_2,\pi_1}(\ell)$. If one has $N$ event types,
altogether there are $N^2$ of these event-event correlation and return response functions.
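A direct estimator of Eq. (10) is sketched below, under the same assumed data layout as before (parallel lists `types` and `signs`, both hypothetical names). The toy stream is chosen so that the two orderings of the indices give different values, illustrating the absence of time reversal symmetry.

```python
def signed_event_correlation(types, signs, pi1, pi2, ell):
    """C_{pi1,pi2}(ell) of Eq. (10): pi1 occurs first, pi2 occurs ell events later."""
    T = len(types)
    p1 = sum(1 for x in types if x == pi1) / T
    p2 = sum(1 for x in types if x == pi2) / T
    n = T - ell  # number of pairs (t, t + ell) inside the sample
    joint = sum(signs[t] * signs[t + ell]
                for t in range(n)
                if types[t] == pi1 and types[t + ell] == pi2) / n
    return joint / (p1 * p2)

# Toy stream in which every MO0 is followed by an LO0 of the same sign,
# while every LO0 is followed by an MO0 of the opposite sign.
types = ["MO0", "LO0", "MO0", "LO0"]
signs = [1, 1, -1, -1]

C_mo_lo = signed_event_correlation(types, signs, "MO0", "LO0", 1)
C_lo_mo = signed_event_correlation(types, signs, "LO0", "MO0", 1)
C_mo_mo_0 = signed_event_correlation(types, signs, "MO0", "MO0", 0)
```

At zero lag the definition gives $C_{\pi,\pi}(0) = 1/P(\pi)$, which is a convenient sanity check on the normalization.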



3.2. The transient impact model (TIM) 

Let us write down the natural generalization of the above transient impact model (TIM), embodied by Eq. (1), to
the many-event case. We now envisage that each event type $\pi$ has a "bare" time dependent impact given by $G_\pi(\ell)$,
such that the price reads:

$$p_t = \sum_{t'<t} \left[ G_{\pi_{t'}}(t-t')\,\epsilon_{t'} + \eta_{t'} \right] + p_{-\infty}, \tag{11}$$

where one selects for each $t'$ the propagator $G_{\pi_{t'}}$ corresponding to the particular event type at that time. After
straightforward calculations, the response function (7) can be expressed as

$$\mathcal{R}_{\pi_1}(\ell+1) - \mathcal{R}_{\pi_1}(\ell) = \sum_{\pi_2} P(\pi_2) \left[ G_{\pi_2}(1)\,C_{\pi_1,\pi_2}(\ell) + \sum_{n>-\ell} \left[ G_{\pi_2}(\ell+n+1) - G_{\pi_2}(\ell+n) \right] C_{\pi_2,\pi_1}(n) \right]. \tag{12}$$



This is a direct extension of Eq. (6). One can invert the system of equations (12) to evaluate the unobservable $G_\pi$'s
in terms of the observable $\mathcal{R}_\pi$'s and $C_{\pi_1,\pi_2}$'s - see Fig. 2(right). Note that we formulate the problem here in terms of
the "derivatives" of the $G_\pi$'s since, as mentioned above, it is numerically much more stable to introduce new variables
for the increments of $\mathcal{R}$ and $G$ and solve (12) in terms of those.

Once this is known, one can explicitly compute the lag dependent diffusion constant $D(\ell) = \langle (p_{t+\ell} - p_t)^2 \rangle$ in terms
of the $G$'s and the $C$'s, generalizing the corresponding result obtained in Ref. Q: see Appendix A.



3.3. The history-dependent impact model (HDIM) 

Now, if one wants to generalize the history dependent impact model (HDIM), Eq. (2), to the many-event case, one
immediately realizes that the impact at time $t$ depends on the type of event one is considering. For one thing, the
instantaneous impact of price non-changing events is trivially zero. For price changing events, $\pi' = \mathrm{MO}', \mathrm{LO}', \mathrm{CA}'$,
the instantaneous impact is given by:

$$r_t = \epsilon_t \Delta_t. \tag{13}$$

Here $\Delta$ depends on the type of price changing event $\pi'$ that happens then, and possibly also on the sign of that event.
For example, if $\pi_t = \mathrm{MO}'$ and $\epsilon_t = -1$ this means that at $t$ a sell market order executed the total volume at the bid.
The midquote price change is $-\Delta_t$, which usually means that the second best level was at $b_t - 2\Delta_t$, where $b_t$ is the
bid price before the event. The factor 2 is necessary, because the ask did not change, and the impact is defined by
the change of the midquote. Hence $\Delta$'s for $\mathrm{MO}'$'s (and similarly $\mathrm{CA}'$'s) correspond to half of the gap between the first
and the second best quote just before the level was removed (see also Ref. [11]). Another example is when $\pi_t = \mathrm{LO}'$
and $\epsilon_t = -1$. This means that at $t$ a sell limit order was placed inside the spread. The midquote price change is $-\Delta_t$,
which means that the limit order was placed at $a_t - 2\Delta_t$, where $a_t$ is the ask price. Thus $\Delta$'s for $\mathrm{LO}'$'s correspond to
half of the gap between the first and the second best quote right after the limit order was placed. In the following
we will call the $\Delta$'s gaps. For large tick stocks, these non-zero $\Delta$'s are most of the time equal to half a tick, and only
very weakly fluctuating. For small tick stocks, substantial fluctuations of the spread, and of the gaps behind the best
quotes, can take place. The generalization of Eq. (2) precisely attempts to capture these gap fluctuations, which are
affected by the flow of past events. If we assume that the whole dynamical process is statistically invariant under
exchanging buy orders with sell orders ($\epsilon \to -\epsilon$) and bids with asks, the dependence of the current non-zero gaps
on the past order flow can only include an even number of $\epsilon$'s. Therefore the lowest order model for non-zero gaps
(including a constant term and a term quadratic in $\epsilon$'s) is:



$$\Delta_t \big|_{\pi_t = \pi',\, \epsilon_t = \epsilon} = \Delta^{R}_{\pi'} + \sum_{t_1<t} \kappa_{\pi_{t_1},\pi'}(t-t_1) \left[ \epsilon_{t_1}\epsilon - C_{\pi_{t_1},\pi'}(t-t_1) \right] + \eta'_t, \tag{14}$$

where $\kappa_{\pi_1,\pi'}$ are kernels that model the dependence of the gaps on the past order flow (note that $\kappa_{\pi_1,\pi'}$ is a $6 \times 3$
matrix) and $\Delta^{R}_{\pi'}$ are the average realized gaps, defined as $\langle \Delta_t \,|\, \pi_t = \pi' \rangle$, since the average of the second term in the
right hand side is identically zero. Note that the last term, equal to $\sum_{n>0} \sum_{\pi} P(\pi)\,\kappa_{\pi,\pi'}(n)\,C_{\pi,\pi'}(n)$, was not explicitly
included in our previous analysis, Ref. [11]. However, typical values are less than 1% of the average realized gap, and
therefore negligible in practice. We set it to zero in the following.

Eq. (14), combined with the definition of the return at time $t$, Eq. (13), leads to our generalization of the HDIM,
Eq. (2):

$$r_t = \Delta^{R}_{\pi_t}\epsilon_t + \sum_{t_1<t} \kappa_{\pi_{t_1},\pi_t}(t-t_1)\,\epsilon_{t_1} + \eta_t. \tag{15}$$

It is interesting to compare the above equation with its analogue for the transient impact model, which reads (after
Eq. (11)):

$$r_t = G_{\pi_t}(1)\,\epsilon_t + \sum_{t_1<t} \left[ G_{\pi_{t_1}}(t-t_1+1) - G_{\pi_{t_1}}(t-t_1) \right] \epsilon_{t_1} + \eta_t. \tag{16}$$

The two models can only be equivalent if:

$$\kappa_{\pi_1,\pi}(\ell) = G_{\pi_1}(\ell+1) - G_{\pi_1}(\ell), \qquad \forall \pi, \tag{17}$$

which means that the "influence matrix" $\kappa_{\pi_1,\pi}$ has a much constrained structure, which has no reason to be optimal.
It is also a priori inconsistent since the TIM leads to a non zero price move even if $\pi$ is a non price-changing event,
since Eq. (17) is valid for all event types $\pi$. This is a major conceptual drawback of the TIM framework (although,
as we will see below, the model fares quite well at reproducing the price diffusion curve).

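The matrix $\kappa_{\pi_1,\pi_2}$ can in principle be determined from the empirical knowledge of the response matrices $S_{\pi_1,\pi_2}$,
since:

$$\frac{1}{P(\pi_2)} S_{\pi_1,\pi_2}(\ell) = \frac{1}{P(\pi_2)} \langle I(\pi_{t+\ell} = \pi_2) \cdot r_{t+\ell} \cdot \epsilon_t \,|\, \pi_t = \pi_1 \rangle = \Delta^{R}_{\pi_2} C_{\pi_1,\pi_2}(\ell)$$
$$\qquad + \frac{1}{P(\pi_1)} \sum_{t'<t+\ell} \sum_{\pi} \kappa_{\pi,\pi_2}(t+\ell-t') \,\langle I(\pi_{t'} = \pi)\,\epsilon_{t'}\, I(\pi_t = \pi_1)\,\epsilon_t \,|\, \pi_{t+\ell} = \pi_2 \rangle. \tag{18}$$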

Note however that the last term includes a three-body correlation function which is not very convenient to estimate.
At this stage and below, we need to make some approximation to estimate higher order correlations. We assume that
all three- and four-body correlation functions can be factorized in terms of two-body correlation functions, as if the
variables were Gaussian. This allows one to extract $\kappa_{\pi_1,\pi_2}$ from a numerically convenient expression, used in [11]:

$$\frac{1}{P(\pi_2)} S_{\pi_1,\pi_2}(\ell) \approx \Delta^{R}_{\pi_2} C_{\pi_1,\pi_2}(\ell) + \sum_{n>-\ell} \sum_{\pi} P(\pi)\,\kappa_{\pi,\pi_2}(\ell+n)\,C_{\pi,\pi_1}(n). \tag{19}$$

Knowing the $\kappa_{\pi_1,\pi_2}$'s and using the same factorization approximation, one can finally estimate the price diffusion
constant, given in Appendix A. Although the factorization approximation used to obtain the diffusion constant looks
somewhat arbitrary, we find that it is extremely precise when applied to the diffusion curve.

For large tick stocks, the gaps hardly vary with time and are all equal to 1 tick. In other words $\kappa_{\pi_1,\pi} \approx 0$ and the
model simplifies enormously, since now $G_\pi(\ell) = \Delta^{R}_\pi$. In this limit, one therefore finds:

$$\mathcal{R}_\pi(\ell) = \langle (p_{t+\ell} - p_t) \cdot \epsilon_t \,|\, \pi_t = \pi \rangle = \Delta^{R}_\pi + \sum_{0<\ell'<\ell} \sum_{\pi_1} \Delta^{R}_{\pi_1} P(\pi_1)\, C_{\pi,\pi_1}(\ell'), \tag{20}$$

which means that the total price response to some event can be understood as its own impact (lag zero), plus the
sum of the biases in the course of future events, conditional to this initial event. These biases are multiplied by the
average price change $\Delta^{R}$ that these induced future events cause. Within the same model, the volatility reads:

$$D(\ell) = \langle (p_{t+\ell} - p_t)^2 \rangle = \sum_{0 \le t', t'' < \ell} \sum_{\pi_1} \sum_{\pi_2} P(\pi_1) P(\pi_2)\, C_{\pi_1,\pi_2}(t'-t'')\, \Delta^{R}_{\pi_1} \Delta^{R}_{\pi_2}. \tag{21}$$

For small ticks, on the other hand, gaps do fluctuate and react to the past order flow; the influence matrix $\kappa_{\pi_1,\pi'}$
describes how the past order flow affects the current gaps. If $\kappa_{\pi_1,\pi'}(\ell)$ is positive, it means that an event of type $\pi_1$
(price changing or not) tends to increase the gaps (i.e. reduce the liquidity) for a later price changing event $\pi'$ in the
same direction, and decrease the gap if the sign of the event $\pi'$ is opposite to that of $\pi_1$.

4. MODEL CALIBRATION AND EMPIRICAL TESTS 

4.1. Data 

We have tested the above ideas on a set of data made of 14 randomly selected liquid stocks traded on the NASDAQ
during the period 03/03/2008 - 19/05/2008, a total of 53 trading days (see Ref. [11] for a detailed presentation of
these stocks and summary statistics). In order to reduce the effects of the intraday spread and liquidity variations we
exclude the first 30 and the last 40 minutes of the trading days. The particular choice of market is not very important:
many of our results were also verified on other markets, such as CME Futures, US Treasury Bonds and stocks traded
at the London Stock Exchange.

Our sample of stocks can be divided into two groups: large tick and small tick stocks. Large tick stocks are such 
that the bid-ask spread is almost always equal to one tick, whereas small tick stocks have spreads that are typically 
a few ticks. The behavior of the two groups is quite different, for example, the events which change the best price 
have a relatively low probability for large tick stocks (about 3% altogether), but not for small tick stocks (up to 40%). 
Note that there is a number of stocks with intermediate tick sizes, which to some extent possess the characteristics of 
both groups. Technically, they can be treated in exactly the same way as small tick stocks, and all our results remain 
valid. However, for the clarity of presentation, we will not consider them explicitly. 

As explained above, we restrict ourselves to events that modify the bid or ask price, or the volume quoted at these 
prices. Events deeper in the order book are unobserved and will not be described: although they do not have an 
immediate effect on the best quotes, our description is still incomplete. Furthermore, we note that the stocks we 
are dealing with are traded on multiple platforms. This may account for some of the residual discrepancies reported 
below. 

Since we consider 6 types of events, there are 6 response functions $\mathcal{R}_\pi$ and propagators $G_\pi$, and 36 correlation functions
$C_{\pi_1,\pi_2}$. However, since the return response functions $S_{\pi_1,\pi_2}$ and the influence kernels $\kappa_{\pi_1,\pi_2}$ are non zero only when
the second event $\pi_2$ is a price changing event, there are only $3 \times 6 = 18$ of them.

4.2. The case of large ticks 

As explained in the previous section, the case of large ticks is quite simple since the gap fluctuation term of HDIM
can be neglected altogether. As shown in Ref. [11], the predictions given by Eqs. (20) and (21) are in very good
agreement with the empirical determination of the $\mathcal{R}_\pi(\ell)$ and the price diffusion $D(\ell)$. Small remaining discrepancies
can indeed be accounted for by adding the gap fluctuation contribution, of the order of a few percent.

Figure 3: (left) $D(\ell)/\ell$ and its approximations, as a function of $\ell$ (events). Crosses correspond to the data. The constant gap (CG) model corresponds
to $G_\pi(\ell) = G_\pi(1)$. TIM corresponds to the temporary impact model calibrated on returns. The curve for HDIM uses the
approximate calibration of the $\kappa$'s; HDIM-3 takes 3 times the $\kappa$'s obtained from the calibration. We also indicate HDIM-3 with
the added constant $D_{hf} \approx 0.04$. Note that the vertical scale is different from Fig. 1, since the time clock is different in the two
cases (all events vs. trades in Fig. 1). (right) Comparison of the three non zero $\kappa_{\mathrm{MO}',\pi'}(\ell)$ with their average over $\pi'$, as a function of $\ell$ (events). Note
that $\kappa_{\mathrm{MO}',\mathrm{MO}'} < 0$; after an $\mathrm{MO}'$ event, gaps on the same side are on average smaller.

The temporary impact model, on the other hand, is not well adapted to describe large tick stocks, for the following
reason: when Eq. (12) is used to extract $G_\pi(\ell)$ from the data, small numerical errors may lead to some spurious time
dependence. But as far as $D(\ell)$ is concerned, any small variation of $G_\pi$ is amplified through the second term of Eq.
(A1), which is an infinite sum of positive terms. As noted in Ref. [11], this leads to large discrepancies between the
predicted $D(\ell)$ and its empirical determination. At any rate, one should clearly favor the calibration of $G_\pi(\ell)$ using
Eq. (12) rather than the analogue of Eq. (4).

4.3. The case of small ticks 

The case of small ticks is much more interesting, since in this case the role of gap fluctuations is crucial, and is a 
priori a stringent test for the two models on stage. 



4.3.1. TIM



Within the temporary impact model, the response functions $\mathcal{R}_\pi(\ell)$ are tautologically accounted for, since they are
used to calibrate the propagators $G_\pi(\ell)$ using Eq. (12). Once the $G_\pi(\ell)$'s are known (see Fig. 2, where $\mathcal{R}_\pi(\ell)$ and
$G_\pi(\ell)$ are shown for small tick stocks), one can compute the time dependent diffusion coefficient $D(\ell)/\ell$, and compare
with empirical data. This is shown in Fig. 3(left). Note that we calibrate the $G_\pi(\ell)$ for each stock separately, compute
$D(\ell)/\ell$ in each case, and then average the results over all stocks. The agreement is surprisingly good for long times,
while for shorter times the model underestimates price fluctuations, which is expected since the model does not allow
for high frequency fluctuations. We also show the prediction based on the constant gap approximation, $\kappa_{\pi_1,\pi_2} = 0$.
Although Fig. 5 suggests that this is an acceptable assumption, we see that $D(\ell)$ is overestimated. As will be argued
below, gaps do adapt to past order flow, and the net effect of the gap dynamics is to reduce the price volatility. We
finally note that calibrating the $G_\pi(\ell)$ on the response functions directly (and not on their derivatives), as was done
in [11], leads to much poorer results for the diffusion coefficient $D(\ell)/\ell$.



4.3.2. HDIM 



We now turn to the history-dependent impact model. As explained above, we determine the influence kernels
$\kappa_{\pi_1,\pi_2}(\ell)$ using Eq. (19). We plot in Fig. 4 the resulting "integrated impact" on the future gaps of all 6 $\pi_1$ events,
which we define as^3:

$$\delta G^{*}_{\pi}(\ell) = \sum_{0<n<\ell} \sum_{\pi'} P(\pi')\,\kappa_{\pi,\pi'}(n). \tag{22}$$

As explained in Ref. [11], $\delta G^{*}_{\pi}(\ell)$ captures the contribution of the gap "compressibility" to the impact of an event of
type $\pi$ up to a time lag $\ell$, leaving the sequence of events unchanged. If $\kappa_{\pi,\pi'}(n)$ were independent of $\pi'$, as postulated
in the TIM, one would have $\delta G^{*}_{\pi}(\ell) = G_\pi(\ell) - G_\pi(1)$ as an identity. The agreement turns out to be excellent (see
Fig. 4), which was not guaranteed a priori since the HDIM is calibrated on a much larger set of correlation functions.
However, this does not mean that the $\kappa_{\pi,\pi'}(n)$ are necessarily independent of $\pi'$. To illustrate the point that Eq. (17)
is too restrictive, Fig. 3(right) compares the three $\kappa_{\mathrm{MO}',\pi'}$'s, which are clearly different from one another. Note that
the average over $\pi'$ is negative, meaning that $\mathrm{MO}'$ events tend to "harden" the book (i.e. after an $\mathrm{MO}'$ event, gaps on
the same side are on average smaller). This is true for all price changing events, while (perhaps surprisingly) small
market orders $\mathrm{MO}^0$ "soften" the book: $\delta G^{*}_{\mathrm{MO}^0}$ is positive and gaps tend to grow. Queue fluctuations ($\mathrm{CA}^0$ and $\mathrm{LO}^0$)
seem less important, but for small ticks these types of events also harden the book. Note finally that for large ticks
$\delta G^{*}$'s are found to be about two orders of magnitude smaller, which confirms that gap fluctuations can be neglected
in that case.
Figure 4: The integrated impact on the future gaps $\delta G^{*}_{\pi}(\ell)$ in the HDIM estimated via Eq. (17). The results are indistinguishable
from $G_\pi(\ell) - G_\pi(1)$ calculated for the TIM. The curves are labeled according to $\pi$ in the legend.



Now, Eq. (19) relies on the factorization of a three-point correlation function and is not exact, so there is no
guarantee that the response functions $\mathcal{R}_\pi(\ell)$ are exactly reproduced using this calibration method. In order to check
this approximation, we have simulated an artificial market dynamics where the price evolves according to Eq. (15)
with the true (historical) sequence of signs and events and $\eta_t = 0$. The kernels $\kappa_{\pi_1,\pi_2}$ are calibrated using Eq.
(19). This leads to the predictions shown as dashed lines in Fig. 5. The agreement can be much improved by
simply multiplying all $\kappa$'s by a factor 3, see Fig. 5. Of course, some discrepancies remain and one should use the
historical simulation systematically to determine the optimal $\kappa$'s. This is, however, numerically much heavier and an
improved analytical approximation of the three-point correlation function, that would allow a more accurate workable
calibration, would be welcome.
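The historical simulation described above amounts to replaying the observed sequence of event types and signs through Eq. (15) with $\eta_t = 0$. The following is a minimal sketch of that replay, not the authors' implementation. It assumes the average realized gaps are given as a dictionary `mean_gap` keyed by price changing event type and the kernels as a dictionary `kappa` keyed by the pair (past type, current price changing type), with lag 1 stored at index 0. All names and numbers are hypothetical.

```python
def hdim_returns(types, signs, mean_gap, kappa, scale=1.0):
    """Replay an event stream through Eq. (15) with eta_t = 0.

    mean_gap[pi']     : average realized gap of price changing type pi'
    kappa[(pi1, pi')] : list of kernel values, kappa[(pi1, pi')][lag - 1]
    scale             : global multiplier applied to every kernel value
    Types absent from mean_gap are treated as non price changing (r_t = 0).
    Kernels are taken to be zero beyond their stored length.
    """
    returns = []
    for t, (pi_t, eps_t) in enumerate(zip(types, signs)):
        if pi_t not in mean_gap:
            returns.append(0.0)
            continue
        r = mean_gap[pi_t] * eps_t
        for t1 in range(t):
            kernel = kappa.get((types[t1], pi_t), [])
            lag = t - t1
            if lag <= len(kernel):
                r += scale * kernel[lag - 1] * signs[t1]
        returns.append(r)
    return returns

# Toy inputs (gaps in ticks); the kernel values are made up.
types = ["MO0", "MO'", "MO'"]
signs = [1, 1, 1]
mean_gap = {"MO'": 0.5}
kappa = {("MO0", "MO'"): [0.1, 0.05], ("MO'", "MO'"): [-0.2, -0.1]}

r_calibrated = hdim_returns(types, signs, mean_gap, kappa)
r_tripled = hdim_returns(types, signs, mean_gap, kappa, scale=3.0)
```

The `scale` argument mimics the global rescaling of the kernels discussed above (a factor 3 in the text). Cumulating the returns gives a simulated price path from which response functions and $D(\ell)$ can be measured in the same way as on real data.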

Finally, we computed $D(\ell)$ for the HDIM using (A2) in the Appendix. Here again, we have tested the quality of
the factorization approximation using the same historical simulation. In this case, the $D(\ell)$ curve is indistinguishable
from its approximation, so any discrepancy between the data and formula (A2) cannot be blamed on its approximate
nature, but rather on an inadequate calibration of the $\kappa$'s.

The result is given in Fig. [Sljleft) together with the previous theoretical predictions and the empirical data.^ With 
the naive calibration the HDIM turns out to be worse than the TIM for large lags: it overestimates D{i) by 15% or so. 
Increasing the k's by a factor 3 again greatly improves the fit but part of the discrepancy remains. For small lags, one 



^ Note that this definition is compatible with the one given in Ref. [Till , beeause of a slight change in the interpretation in the k kernels 

here. 
* The D(£)-HDIM shown here is indistinguishable from the one appearing in Fig. 16 of Ref. Illl . 
Figure 5: $\mathcal{R}_\pi(\ell)$ and their approximation with the HDIM. Symbols correspond to data; they are perfectly in line with the model 
prediction under the assumption that approximation (19) is correct. The dashed lines correspond to the response function of an 
actual simulation of the model with the $\kappa$'s calibrated via Eq. (19). The solid lines correspond to the simulation if we increase 
all calibrated $\kappa$'s by a factor 3 (HDIM-3). The colors vary according to $\pi$ as shown in the legend. 



needs to add a constant contribution $D_{hf} \approx 0.04$ ticks squared to match the data.^ The HDIM produces a significant 
improvement over the constant gap model, because it explicitly includes the effect of gap fluctuations. However, since 
the calibration procedure relies on an approximation, we do not reproduce the response functions exactly. Hence the 
better founded model (HDIM) fares worse in practice than a model with theoretical inconsistencies (TIM). As noted 
above, a better calibration procedure for the k's could improve the situation. 

At any rate, numerical discrepancies should be expected regardless of the fitting procedure, since we have neglected 
several effects, which must be present. These include (i) all volume dependence, (ii) unobserved events deeper in the 
book and on other platforms and (iii) higher order, non-linear contributions to model history dependence. On the 
last point, we note that based on symmetry arguments, the gap fluctuation term may include higher order terms of 
the form: 



$$
\sum_{t_1,t_2,t_3<t'} \kappa_{\pi_{t_1},\pi_{t_2},\pi_{t_3};\pi_{t'}}(t'-t_1,\,t'-t_2,\,t'-t_3)\,\epsilon_{t_1}\epsilon_{t_2}\epsilon_{t_3}\epsilon_{t'}, \tag{23}
$$



or with a larger (even) number of $\epsilon$'s. The presence of a four-$\epsilon$ term is in fact suggested by the data shown in Fig. 
13 of Ref. [11], and also by a more recent analysis [15]. It would be interesting to study these effects in detail, and 
understand their impact on price diffusion. 



5. CONCLUSION 

Let us summarize what we have tried to accomplish in the present paper. Our aim was to provide a general 
framework to describe the impact of different events in the order book, in a way that is flexible enough to deal with 
any classification of these events (provided this classification makes sense). ^ We have specifically considered market 
orders, limit orders and cancellations at the best quotes, further subdividing each category into price-changing and non 
price-changing events, giving a total of 6 types. In trying to generalize previous work, which focused on the impact of 
market orders only, we have discovered that two different models can be envisaged. These are equivalent when only a 
single event type, market orders regardless of their aggressiveness, is taken into account. One model posits that each 
event type has a temporary impact (TIM), whereas the other assumes that only price-changing events have a direct 
impact, which is itself modulated by the past history of all events, a model we called "history dependent impact" 
(HDIM). 



^ This contribution accounts for high frequency "noise" in the data that the model is not able to reproduce, as, for example, sequences of placement and cancellation of the same limit order inside the gap. 
® See Ref. [13] for an application of this method to orders with brokerage codes. 
The TIM is a natural extension of Hasbrouck's VAR model to a multi-event setting: one writes a Vector Autoregression 
model for the return at time $t$ in terms of all signed past events, but neglects the direct influence of past returns 
themselves (although these would be easy to include if needed). We have discussed the fact that TIMs are, strictly 
speaking, inconsistent since they assign a non-zero immediate impact to non price-changing events. Still, provided the 
model is correctly calibrated using returns (see Eq. (12)), we find that the TIM framework allows one to reproduce 
the price diffusion pattern surprisingly accurately. 

The HDIM family can also be thought of as a VAR model, although one now distinguishes between different types of 
event-induced returns before regressing them on past events. The HDIM is interesting because it gives a very appealing 
interpretation of the price changing process in terms of history dependent "gaps" that determine the amplitude of 
the price jump if a certain type of price-changing event takes place. We have in particular defined a lag-dependent, 
$6 \times 3$ "influence matrix" (called $\kappa_{\pi,\pi'}$ in the text), which tells us how much, on average, an event of type $\pi$ affects the 
immediate impact of a $\pi'$ price-changing event of the same sign in the future. 

The HDIM therefore envisages the dynamics of prices as consisting of three processes: instantaneous jumps due to 
events, events inducing further events and thereby affecting the future jump probabilities (described by the correlation 
between events), and events exerting pressure on the gaps behind the best price and thereby affecting the future 
jump sizes (described by the $\kappa$'s). By describing this third effect with a linear regression process, we came up with 
the explicit model (15), which can be calibrated on empirical data provided some factorization approximation is made 
(which unfortunately turns out not to be very accurate, calling for further work on this matter). This allows one 
to measure the influence matrix $\kappa$ and its lag dependence. We find in particular that price-changing events, such as 
aggressive market orders MO', tend to reduce the impact of later events of the same sign (i.e. a buy MO' following 
a buy MO') but increase the impact of later events of the opposite sign. As stressed in Refs. [a, 0, 0' this history 
dependent asymmetric liquidity is the dominant effect that mitigates persistent trends in prices which would otherwise 
be induced by the long-ranged correlation in the sign of market orders. 

In spite of these enticing features, we have found that the HDIM leads to a worse determination of the price diffusion 
properties than the TIM. The almost perfect agreement between the TIM prediction and empirical data is perhaps 
accidental, but it may also be that TIMs (which have fewer parameters) are numerically more robust than HDIMs. For 
HDIMs, a more accurate calibration procedure is needed. This could be achieved either by finding a better, workable 
approximation for the three-point correlation function, or by using a purely numerical approach based on a historical 
simulation of the HDIM. On the other hand, some effects have been explicitly neglected, such as the role of unobserved 
events deeper in the book and on other platforms, or possible non-linearities in the history dependence of gaps. It 
would be very interesting to investigate the relevance of these effects, and to come up either with a fully consistent 
version of HDIM, or with a convincing argument for why the TIM appears to be particularly successful. 

In any case, we hope that the intuitive and versatile framework that we proposed above, together with operational 
calibration procedures, will help make sense of the highly complex and intertwined sequences of events that take 
place in the order books, and allow one to build a comprehensive theory of price formation in electronic markets. 
Appendix A: Expression of the price diffusion for the TIM and HDIM 

We give here the rather ugly looking explicit expressions for the diffusion curve $D(\ell)$ in both models. For the TIM, 
one gets as an exact expression: 



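$$
\begin{aligned}
D(\ell) = D_0\ell &+ \sum_{0\le n<\ell}\sum_{\pi_1} G_{\pi_1}(\ell-n)^2 P(\pi_1) + \sum_{n>0}\sum_{\pi_1}\left[G_{\pi_1}(\ell+n)-G_{\pi_1}(n)\right]^2 P(\pi_1) \\
&+ 2\sum_{0\le n<n'<\ell}\sum_{\pi_1,\pi_2} G_{\pi_1}(\ell-n)\,G_{\pi_2}(\ell-n')\,C_{\pi_1,\pi_2}(n'-n) \\
&+ 2\sum_{0<n<n'<\infty}\sum_{\pi_1,\pi_2} \left[G_{\pi_1}(\ell+n)-G_{\pi_1}(n)\right]\left[G_{\pi_2}(\ell+n')-G_{\pi_2}(n')\right] C_{\pi_1,\pi_2}(n-n') \\
&+ 2\sum_{0\le n<\ell}\sum_{n'>0}\sum_{\pi_1,\pi_2} G_{\pi_1}(\ell-n)\left[G_{\pi_2}(\ell+n')-G_{\pi_2}(n')\right] C_{\pi_2,\pi_1}(n'+n),
\end{aligned}
\tag{A1}
$$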
where $D_0$ is the variance of the noise term $\eta_t$. 
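
An expression of this kind can be checked numerically by building a TIM price path from a sequence of events and measuring its diffusion directly. The sketch below is an illustration only: it assumes that the TIM price is the superposition $p_t = \sum_{t'<t} G_{\pi_{t'}}(t-t')\,\epsilon_{t'}$ with the noise term switched off, and that the propagators are stored up to a finite maximal lag (any impact beyond that lag is dropped). The function name `tim_prices` and the array layout of `G` are hypothetical.

```python
import numpy as np

def tim_prices(event_types, signs, G):
    """Price path of an assumed noiseless TIM.

    event_types : int array (T,), event type index of each event
    signs       : array (T,) of +1/-1
    G           : array (n_types, L); G[a, lag - 1] is the propagator of
                  type a at the given lag (truncated at L lags)
    Returns an array of length T + 1; entry t is the price just before event t.
    """
    T = len(signs)
    L = G.shape[1]
    prices = np.zeros(T + 1)
    for t in range(1, T + 1):
        total = 0.0
        for lag in range(1, min(L, t) + 1):
            total += G[event_types[t - lag], lag - 1] * signs[t - lag]
        prices[t] = total
    return prices
```

With a propagator that is constant in the lag the impact of the corresponding event type is permanent up to the truncation lag, whereas a decaying propagator gives a temporary impact.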

For the HDIM, on the other hand, one has to use a factorization approximation to compute 3- and 4-point correlation 
functions in terms of 2-point correlations. One can finally estimate the price diffusion constant, which is given by the 
following approximate equation [11]: 



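$$
\begin{aligned}
D(\ell) = \langle (p_{t+\ell}-p_t)^2\rangle \approx D_0\ell &+ \sum_{0\le t',t''<\ell}\sum_{\pi_1,\pi_2} P(\pi_1)P(\pi_2)\,C_{\pi_1,\pi_2}(t'-t'')\,\Delta^R_{\pi_1}\Delta^R_{\pi_2} \\
&+ \sum_{-\ell<t<\ell}\sum_{\pi_2,\pi_3}\sum_{\tau>0} \left[\,\cdots\,\right] \\
&+ \sum_{-\ell<t<\ell}\sum_{\pi_2,\pi_4}\sum_{\tau,\tau'>0} (\ell-|t|)\,\kappa^{++}_{\pi_2,\pi_4}(\tau,\tau',t)\,C_{\pi_2,\pi_4}(\tau-\tau'+t)\,P(\pi_2)P(\pi_4),
\end{aligned}
\tag{A2}
$$

where $D_0$ is again the variance of the noise $\eta_t$, 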

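$$
\kappa^{+}_{\pi_2,\pi_3}(\tau,t) = \sum_{\pi_1}\kappa_{\pi_2,\pi_1}(\tau)\left[I(t=0)\,I(\pi_1=\pi_3) + I(t\neq 0)\,P(\pi_1) + I(t=-\tau)\,P(\pi_1)\,\Pi_{\pi_2,\pi_1}(\tau)\right], \tag{A3}
$$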

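and, for $t > 0$, 

$$
\kappa^{++}_{\pi_2,\pi_4}(\tau,\tau',t) = \sum_{\pi_1,\pi_3}\kappa_{\pi_2,\pi_1}(\tau)\,\kappa_{\pi_4,\pi_3}(\tau')\left\{I(t=\tau')\,I(\pi_1=\pi_4)\,P(\pi_3) + I(t\neq\tau')\,P(\pi_1)P(\pi_3)\left[\Pi_{\pi_1,\pi_3}(t)+1\right]\right\}, \tag{A4}
$$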

whereas for $t < 0$, we use $\kappa^{++}_{\pi_2,\pi_4}(\tau,\tau',-t) = \kappa^{++}_{\pi_4,\pi_2}(\tau',\tau,t)$. We also introduced a correlation function between event 
types as [11]: 

$$
\Pi_{\pi_1,\pi_2}(\ell) = \frac{P(\pi_{t+\ell}=\pi_2 \mid \pi_t=\pi_1)}{P(\pi_2)} - 1 = \frac{\langle I(\pi_t=\pi_1)\,I(\pi_{t+\ell}=\pi_2)\rangle}{P(\pi_1)P(\pi_2)} - 1.
$$
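
A correlation function of this kind between event types can be estimated directly from the sequence of event types. The sketch below assumes the normalization $\Pi_{\pi_1,\pi_2}(\ell) = \langle I(\pi_t=\pi_1)\,I(\pi_{t+\ell}=\pi_2)\rangle / [P(\pi_1)P(\pi_2)] - 1$, so that the result vanishes for independent events; the function name `type_correlation` is hypothetical and the lag is assumed to be at least 1.

```python
import numpy as np

def type_correlation(event_types, pi1, pi2, lag):
    """Estimate Pi_{pi1,pi2}(lag) from a sequence of event type indices (lag >= 1)."""
    e = np.asarray(event_types)
    first = e[:-lag] == pi1        # event at time t is of type pi1
    second = e[lag:] == pi2        # event at time t + lag is of type pi2
    p1 = np.mean(e == pi1)         # unconditional frequency of pi1
    p2 = np.mean(e == pi2)         # unconditional frequency of pi2
    return np.mean(first & second) / (p1 * p2) - 1.0
```

A positive value means that an event of type `pi2` is more likely than average `lag` events after an event of type `pi1`, a negative value that it is less likely.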